---
title: Benchmarks
description: Computer Agent SDK benchmarks for agentic GUI tasks
---

The benchmark system evaluates models on GUI grounding tasks, specifically agent loop success rate and click prediction accuracy. It supports both:

- **Computer Agent SDK providers** (using model strings like `"huggingface-local/HelloKKMe/GTA1-7B"`)
- **Reference agent implementations** (custom model classes implementing the `ModelProtocol`)
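For the second option, a reference model only needs to satisfy the protocol's interface. The sketch below is illustrative only: the real `ModelProtocol` is defined in the benchmark repository, and the method name `predict_click` and its signature here are assumptions, not the SDK's actual API.

```python
from typing import Protocol, Tuple


class ModelProtocol(Protocol):
    """Hypothetical sketch of the protocol a custom model implements.
    Method name and signature are illustrative assumptions; see the
    benchmark repository for the real definition."""

    def predict_click(self, image: bytes, instruction: str) -> Tuple[int, int]:
        """Return (x, y) pixel coordinates of the element to click."""
        ...


class CenterClickModel:
    """Trivial reference model: always predicts the screen center.
    Useful as a baseline sanity check when wiring up a benchmark."""

    def __init__(self, width: int = 1920, height: int = 1080) -> None:
        self.width = width
        self.height = height

    def predict_click(self, image: bytes, instruction: str) -> Tuple[int, int]:
        # Ignore the screenshot and instruction; return the center pixel.
        return (self.width // 2, self.height // 2)


# Structural typing: CenterClickModel satisfies the protocol without inheriting it.
model: ModelProtocol = CenterClickModel()
print(model.predict_click(b"", "click the OK button"))  # (960, 540)
```

Because `Protocol` uses structural typing, any class with a matching method satisfies it; no inheritance from the protocol class is required.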

## Available Benchmarks

- **[ScreenSpot-v2](./benchmarks/screenspot-v2)** - Standard resolution GUI grounding
- **[ScreenSpot-Pro](./benchmarks/screenspot-pro)** - High-resolution GUI grounding
- **[Interactive Testing](./benchmarks/interactive)** - Real-time testing and visualization

## Quick Start

```bash
# Clone the benchmark repository
git clone https://github.com/trycua/cua
cd cua/libs/python/agent/benchmarks

# Install dependencies
pip install "cua-agent[all]"

# Run a benchmark
python ss-v2.py
```
